Video-based pulse measurement

ABSTRACT

Aspects of the subject disclosure are directed towards a video-based pulse/heart rate system that may use motion data to reduce or eliminate the effects of motion on pulse detection. Signal quality may be computed from (e.g., transformed) video signal data, such as by providing video signal feature data to a trained classifier that provides a measure of the quality of pulse information in each signal. Based upon the signal quality data, corresponding waveforms may be processed to select one for extracting pulse information therefrom. Heart rate data may be computed from the extracted pulse information, which may be smoothed into a heart rate value for a time window based upon confidence and/or prior heart rate data.

BACKGROUND

Heart rate is considered one of the more important and well-understood physiological measures. Researchers in a variety of fields have developed techniques that measure heart rate as accurately and unobtrusively as possible. These techniques enable heart rate measurements to be used by applications ranging from health sensing to games, along with interfaces that respond to a user's physical state.

One approach to measuring heart rate unobtrusively and inexpensively is based upon extracting pulse measurements from videos of faces, captured with an RGB (red, green, blue) camera. This approach found that intensity changes due to blood flow in the face were most apparent in the green video component channel, whereby this green component was used to extract estimates of pulse rate.

Existing video-based techniques are not robust, however. For example, the above technique based upon the green channel needs a very stable face image. Indeed, existing approaches (including those in deployed products) do not work well with even relatively slight levels of user movement and/or with variation in ambient lighting.

SUMMARY

This Summary is provided to introduce a selection of representative concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used in any way that would limit the scope of the claimed subject matter.

Briefly, various aspects of the subject matter described herein are directed towards a video-based pulse measurement technology that in one or more aspects operates by computing pulse information from video signals of a subject captured by a camera over a time window. The technology includes processing signal data that contains the pulse information and that corresponds to at least one region of interest of the subject. The pulse information is extracted from the signal data, including by using motion data to reduce or eliminate effects of motion within the signal data. In one or more aspects, at least some of the motion data may be obtained from the video signals and/or from an external motion sensor.

One or more aspects include a signal quality estimator that is configured to receive candidate signals corresponding to a plurality of captured video signals of a subject. For each candidate signal, the signal quality estimator determines a signal quality value that is based at least in part upon the candidate signal's resemblance to pulse information. A heart rate extractor is configured to compute heart rate data corresponding to an estimated heart rate of the subject based at least in part upon the quality values.

One or more aspects are directed towards providing sets of feature data to a classifier, each set of feature data including feature data corresponding to video data of a subject captured at one of a plurality of regions of interest. Quality data is received from the classifier for each set of feature data, the quality data providing a measure of pulse information quality represented by the feature data. Pulse information is extracted from video signal data corresponding to the video data of the subject, including by using the quality data to select the video signal data. The feature data may include motion data as part of the feature data for each set.

Other advantages may become apparent from the following detailed description when taken in conjunction with the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and not limited in the accompanying figures in which like reference numerals indicate similar elements and in which:

FIG. 1 is a block diagram illustrating example components that may be used in video-based pulse measurement for heart rate detection, according to one or more example implementations.

FIG. 2 is a block diagram illustrating example components and data flow operations that may be used in video-based pulse measurement for heart rate detection, according to one or more example implementations.

FIG. 3 is an example representation of region of interest detection and processing for a plurality of video-captured regions, according to one or more example implementations.

FIG. 4 is a block diagram showing example processing operations and example output at each such processing operation, according to one or more example implementations.

FIGS. 5A-5C are example representations of various aspects of motion filtering with respect to video-based pulse measurement, according to one or more example implementations.

FIGS. 6A-6C are example representations of feature extraction from signals showing normalized autocorrelation versus time for use in selecting signals for video-based pulse measurement, according to one or more example implementations.

FIG. 7A provides example representations of power spectra from selected components and corresponding values of peak confidence, according to one or more example implementations.

FIG. 7B is an example representation of waveforms in which classifier-provided confidence values are overridden by spectral peak confidence values with respect to selection, according to one or more example implementations.

FIGS. 8 and 9 comprise a flow diagram illustrating example steps that may be taken to determine heart rate from video signals according to one or more example implementations.

FIG. 10 is a block diagram representing an example non-limiting computing system or operating environment into which one or more aspects of various embodiments described herein can be implemented.

DETAILED DESCRIPTION

Various aspects described herein are generally directed towards a robust video-based pulse measurement technology. The technology is based in part upon video signal quality estimation including one or more techniques for estimating the fidelity of a signal to obtain candidate signals. Further, given one or more signals that are candidates for extracting pulse and the quality estimation metrics, described are one or more techniques for extracting heart rate from those signals in a more accurate and robust manner relative to prior approaches. For example, one technique compensates for motion of the subject based upon motion data sensed while the video is being captured.

Still further, temporal smoothing is described, such that given a series of heart rate values following extraction (e.g., thirty seconds of heart rate values that were recomputed every second), described are ways of “smoothing” the heart rate signal/values into a measurement that is suitable for application-level use or presentation to a user. For example, data that indicate a heart rate that changes in a way that is not physiologically plausible may be discarded or otherwise have a lowered associated confidence.

It should be understood that any of the examples herein are non-limiting. For example, the technology is generally described in the context of heart rate estimation from video sources; however, alternative embodiments may apply the technology to other sources of heart rate signals. Such other sources may include photoplethysmograms (PPGs, as used in finger pulse oximeters and heart-rate-sensing watches), electrocardiograms (ECGs), or pressure waveforms. Thus, the “candidate signals” referred to herein may include signals from one or more sensors (e.g., a red light sensor, a green light sensor, and a pressure sensor under a watch) or one or more locations (e.g., two different electrical sensors). A motion signal may be derived from an accelerometer in some situations, for example.

Further, while face tracking is one technique, another physiologically relevant region (or regions) of interest may be used. For example, the video signals or other sensor signals may be one or more patches of a subject's skin and/or eye.

As such, the present invention is not limited to any particular embodiments, aspects, concepts, structures, functionalities or examples described herein. Rather, any of the embodiments, aspects, concepts, structures, functionalities or examples described herein are non-limiting, and the present invention may be used in various ways that provide benefits and advantages in heart rate estimation and signal processing in general.

FIG. 1 is a block diagram showing one suitable implementation of the technology described herein. A camera 102 captures signals such as frames of RGB data of a human subject 104; other color schemes may be used, as may non-visible light frequencies such as infrared (IR). A video-based pulse measurement system 106 processes the received signal information and outputs suitable data, such as a current heart rate at regular intervals, to a program 108 such as an application, service or the like. For example, such an application may be running on a personal computer, smartphone, tablet computing device, handheld computing device, smart television, standalone device, exercise equipment, medical monitoring device and so on. Note that as indicated via the dashed arrow in FIG. 1, the program 108 may provide data to the video-based pulse measurement system 106, e.g., parameters such as a time window, quality and/or confidence thresholds, smoothing constraints, capabilities of the program, and so on. In this way, for example, an application in a piece of exercise equipment may operate in a different way than a game application that counts calories burned.

Within the exemplified video-based pulse measurement system 106, a number of components may be present, such as generally arranged in a processing pipeline in one or more implementations. The components, which in this example include a signal quality estimator 110, a heart rate extractor 112 and a smoothing component 114, may be standalone modules, subsystems and so forth, or may be component parts of a larger program. Each of the components may include further components, e.g., the signal quality estimator 110 and/or the heart rate extractor 112 may include motion processing logic. Further, not all of the components may be present in a given implementation, e.g., smoothing need not be performed, or may be performed external to the video-based pulse measurement system 106. Additional details related to signal quality estimation, heart rate extraction and smoothing are provided below.

FIG. 2 is a general block diagram illustrating example components of one embodiment of a video-based pulse measurement system (such as the system 106 of FIG. 1). As is understood, the exemplified implementation of FIGS. 1 and 2 is based upon a combination of signal quality estimation, heart rate extraction and/or temporal smoothing.

In FIG. 2, an input video signal 222, which for example may contain RGB and/or infrared (IR) components, is provided to a face tracking mechanism 224. In general, the face tracking mechanism 224 locates and tracks one or more regions of interest, such as the face itself, the cheeks and so on. However, as is understood, this is only one example, as any place other than the face where skin may be sensed (instead of or in addition to the face) may be selected as a region of interest, as may non-skin regions such as the eye or part of the eye. Note that known prior approaches sensed the whole face.

Region of interest tracking is generally exemplified as face tracking 330 in FIG. 3, in which regions of interest ROI 1, ROI 2 and ROI 3 provide R, G and B signals 332 for each region. In this example, a local average or the like may be computed from each ROI and each color channel, resulting in a total of nine intensity values (three regions by three component values) per frame. Note that this is only one example, and candidate signals need not be one-dimensional; for example, the technology/heuristics may be applied to the combined RGB signal instead of the individual RGB components. Note that it is feasible to use multiple cameras, which may be of the same type (e.g., RGB cameras) or a mix of camera types (e.g., RGB and IR cameras).
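By way of a non-limiting illustration, the per-frame averaging described above may be sketched as follows. The function name `roi_channel_means`, the frame layout (a nested list of RGB tuples) and the box coordinates are assumptions made for this sketch only and are not taken from the described implementation.

```python
from statistics import fmean


def roi_channel_means(frame, rois):
    """Return one mean intensity per (ROI, channel) pair for a single frame.

    frame: 2-D list of (r, g, b) tuples, indexed as frame[row][col].
    rois:  list of (top, left, bottom, right) boxes; bottom/right are exclusive.
    """
    values = []
    for top, left, bottom, right in rois:
        pixels = [frame[y][x] for y in range(top, bottom) for x in range(left, right)]
        for channel in range(3):  # 0 = R, 1 = G, 2 = B
            values.append(fmean(p[channel] for p in pixels))
    return values


# Hypothetical 4x4 frame and three regions (whole frame plus two sub-regions).
frame = [[(10 * y, 20 * x, 5) for x in range(4)] for y in range(4)]
rois = [(0, 0, 4, 4), (2, 0, 4, 2), (2, 2, 4, 4)]
sample = roi_channel_means(frame, rois)  # nine values: three regions by three channels
```

Collecting one such nine-element vector per frame over N frames would give nine time series, each of which may serve as a raw signal for the later stages.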

Conventional computer vision algorithms may be used to provide a face detector that yields approximate locations of the face (square) and the basic features (eyes, nose, and mouth) in each frame. However, in addition to the whole face (ROI 1), in the example of FIG. 3 the cheek regions are also extracted from each frame (ROIs 2 and 3). The cheeks tend to be useful because they are predominantly soft tissue that exhibits significant pulsatile changes with blood flow. This data may be band-pass filtered, e.g., with a second-order Butterworth filter with a pass band between 0.75 and 4 Hz, corresponding to 45-240 beats per minute. Note that the whole face may be considered a region of interest, and as shown in FIG. 3, regions of interest may overlap.

Returning to FIG. 2, the signals corresponding to the tracked regions may be transformed by a suitable transform 226 such as independent component analysis (ICA) or principal component analysis (PCA). This results in one or more candidate pulse signals 228.

The one or more candidate pulse signals 228 along with any related features may be processed (e.g., by a classifier/scorer) to obtain signal quality metrics 230 for each candidate signal, which may be combined or otherwise processed into summary quality metric data 232 for each candidate signal, as described below. Candidate filtering 234 may be used to select the top k (e.g., the top two) candidates based upon their quality values, which may be transformed into a power spectrum 236 for each candidate signal. As described herein, peak signals in the power spectrum 236 that may represent a pulse, but alternatively may be caused by motion of the subject, may be eliminated or at least lowered in quality estimation during heart rate estimation by the use of a similar motion power spectrum.

In general, the signal quality estimator 110 (FIG. 1) takes candidate signals that may contain information about pulse and determines the extent to which each candidate signal actually contains pulse information (providing a quality estimate). As one non-limiting example, a candidate signal may correspond to some duration (e.g., thirty seconds) of data from just the green channel from a camera from a particular region of the image (e.g., the entire face, one cheek, and so forth), averaged down to one continuous signal. Two other non-limiting examples of candidate signals may be average values for some duration (e.g., thirty seconds) of data from the red and blue channels, respectively. Still other non-limiting examples are based upon some duration (e.g., thirty seconds) of data from a transformation of the RGB signal from a region, e.g., the nine principal component vectors of the average RGB signals from three regions; each of the nine component vectors may be one candidate signal.

Signal quality estimation basically determines how much each of these candidate signals contains information about pulse. Various metrics or features may be used for estimating signal quality, and any number of such metrics may be put together into a classification or regression system to provide a unified measure of signal quality. Note that these metrics may be applied to each candidate signal separately.

In one or more implementations, the metrics are typically computed on windows of every candidate signal source, for example the last thirty seconds of the R, G, and B channels, recomputed every five seconds. However, they may alternatively be run on an entire video or on very short segments of data.

Metrics for signal quality may include various features for signal quality from the autocorrelation of the signal. The autocorrelation is a standard transformation in signal processing that helps measure the repetitiveness of a signal. The autocorrelation of a one-dimensional signal produces another one-dimensional signal. The number of peaks in the autocorrelation and the magnitude of the first prominent peak in the autocorrelation are computed (where “prominent” may be defined by a threshold height and a threshold distance from other peaks), along with the mean and variance of the spacing between peaks in the autocorrelation. Note that these are only examples of some useful autocorrelation-based features. Any number of heuristics related to repetitiveness that are derived from the autocorrelation may be used in addition to or instead of those described above.
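The autocorrelation-based features above may be sketched, in a non-limiting way, as follows. The function names, the threshold defaults (`min_height`, `min_distance`) and the simple local-maximum peak test are assumptions chosen for illustration; the text does not specify particular threshold values.

```python
import math


def normalized_autocorrelation(signal):
    """Autocorrelation of the mean-removed signal, scaled so that lag 0 equals 1."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    energy = sum(c * c for c in centered)
    if energy == 0:
        return [0.0] * n
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / energy
        for lag in range(n)
    ]


def autocorrelation_features(signal, min_height=0.25, min_distance=3):
    """Count prominent peaks and summarize their height and spacing (assumed thresholds)."""
    ac = normalized_autocorrelation(signal)
    peaks = []
    for lag in range(1, len(ac) - 1):
        is_local_max = ac[lag] > ac[lag - 1] and ac[lag] >= ac[lag + 1]
        far_enough = not peaks or lag - peaks[-1] >= min_distance
        if is_local_max and ac[lag] >= min_height and far_enough:
            peaks.append(lag)
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]
    spacing_mean = sum(spacings) / len(spacings) if spacings else 0.0
    spacing_variance = (
        sum((s - spacing_mean) ** 2 for s in spacings) / len(spacings) if spacings else 0.0
    )
    return {
        "peak_count": len(peaks),
        "first_peak_lag": peaks[0] if peaks else None,
        "first_peak_height": ac[peaks[0]] if peaks else 0.0,
        "spacing_mean": spacing_mean,
        "spacing_variance": spacing_variance,
    }


# Hypothetical, perfectly periodic test signal with a period of 10 samples.
periodic = [math.sin(2 * math.pi * i / 10) for i in range(100)]
features = autocorrelation_features(periodic)
```

For a strongly repetitive signal such as this one, the peak spacing is regular (low variance) and the first prominent peak is high; a noisy candidate would tend to show fewer, lower and more irregularly spaced peaks.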

Other features for signal quality may be derived, such as statistics on the time-domain signal itself, e.g., kurtosis, variance, number of zero crossings. Kurtosis is a useful time-domain statistic.
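A minimal sketch of these time-domain statistics follows. The function name is hypothetical, kurtosis is computed here as the plain fourth standardized moment (not excess kurtosis), and zero crossings are counted on the mean-removed signal; both are conventions assumed for this illustration.

```python
def time_domain_features(signal):
    """Variance, kurtosis and zero-crossing count of a one-dimensional signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    variance = sum(c * c for c in centered) / n
    fourth_moment = sum(c ** 4 for c in centered) / n
    # Fourth standardized moment; defined as 0.0 here for a constant signal.
    kurtosis = fourth_moment / variance ** 2 if variance > 0 else 0.0
    # A crossing is a sign change between consecutive mean-removed samples.
    zero_crossings = sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)
    return {"variance": variance, "kurtosis": kurtosis, "zero_crossings": zero_crossings}


stats = time_domain_features([1.0, -1.0, 1.0, -1.0])
```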

Still other features for signal quality may be derived by comparing the signal to a template of what known pulse signals look like, e.g., by cross-correlation or dynamic time warping. Pulse signals tend to have a characteristic shape that is not perfectly symmetric and does not look like typical random noise, and the presence or absence of this pattern may be exploited as a measure of quality. High correlation with a pulse template is generally indicative of high signal quality. This can be done using a static dictionary of pulse waveforms, or using a dynamic dictionary, e.g., populated from recent pulses observed in the current data stream that are assigned high confidence by other metrics.

Other features for signal quality may be derived from the power spectrum of the candidate signal. In particular, the power spectrum of a signal that represents heart rate tends to show a single peak around the heart rate. One implementation thus computes the magnitude ratio of the largest peak in the range of human heart rates to the second-largest peak, referred to as “spectral confidence.” If the largest peak is much larger than the next-largest peak, this is indicative of high signal quality. The spectral entropy of the power spectrum, a standard metric used to describe the degree to which a spectrum is primarily concentrated around a single peak, may be similarly used for computing a spectral confidence value.
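The peak-ratio form of spectral confidence may be sketched as follows. This is an illustrative assumption-laden version: it uses a direct discrete Fourier transform for self-containment, treats 0.75 to 4 Hz as the range of human heart rates, and uses a simple local-maximum test for peaks; the names `power_spectrum` and `spectral_confidence` are hypothetical.

```python
import cmath
import math


def power_spectrum(signal, sample_rate):
    """Return (frequency_hz, power) pairs from a direct DFT of the mean-removed signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(centered)
        )
        spectrum.append((k * sample_rate / n, abs(coeff) ** 2))
    return spectrum


def spectral_confidence(spectrum, low_hz=0.75, high_hz=4.0):
    """Return (peak frequency, ratio of largest to second-largest in-band peak)."""
    in_band = [(f, p) for f, p in spectrum if low_hz <= f <= high_hz]
    peaks = [
        in_band[i]
        for i in range(1, len(in_band) - 1)
        if in_band[i][1] > in_band[i - 1][1] and in_band[i][1] >= in_band[i + 1][1]
    ]
    peaks.sort(key=lambda fp: fp[1], reverse=True)
    if not peaks:
        return None, 0.0
    if len(peaks) == 1:
        return peaks[0][0], math.inf
    return peaks[0][0], peaks[0][1] / peaks[1][1]


# Hypothetical candidate: a 1.2 Hz component plus a weaker 2.5 Hz component,
# sampled at 10 Hz for 10 seconds.
rate = 10.0
candidate = [
    math.sin(2 * math.pi * 1.2 * i / rate) + 0.5 * math.sin(2 * math.pi * 2.5 * i / rate)
    for i in range(100)
]
peak_hz, confidence = spectral_confidence(power_spectrum(candidate, rate))
```

In this example the dominant peak lies at 1.2 Hz (72 beats per minute), and the second component has half the amplitude and therefore one quarter of the power, giving a ratio of about four.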

The following is a non-limiting set of signal data/feature data that may inform signal quality estimation, some or all of which may be fed into the classifier/scorer:

1) Motion information (from video or external, e.g., inertial sensors)
2) Light information from outside the ROI, either from other parts of the video signal and/or from a separate video/ambient light sensor
3) Previous observed heart rates
4) Distance between the camera and the user
5) Activity level (from motion, skeleton tracking, etc.)
6) Demographic information: height, weight, age, gender, race (particularly skin tone)
7) Temperature
8) Humidity
9) Other derived visual properties of the ROI, e.g., hairiness, sweatiness

Each of the metrics described herein may provide an independent estimate of how much a candidate signal contains information about pulse. To integrate these together into a single quality metric for a candidate signal, a supervised machine learning approach may be used, for example. In one example embodiment, these metrics are computed for every candidate signal in every thirty second window in a “training data set”, for which there is an external measure of the true heart rate (e.g., from an electrocardiogram). For each of those candidate signals, a human expert also may rate the candidate signal for its quality, and/or the signal is automatically rated by running a heart rate extraction process on the signal and comparing the result to the true heart rate. This is thus a very typical supervised machine learning problem, namely that a model is trained to take those metrics and predict signal quality given new data (for which the “true” heart rate is not known). The model may be continuous (producing an estimate of overall signal quality) or discrete (labeling the signal as “good” or “bad”). The model may be a simple linear regressor (as described in one example herein), or may be a more complex classifier/regressor (e.g., a boosted decision tree, neural network, and so forth).

With respect to heart rate estimation, given the candidate signals that may contain information about pulse, and the quality metrics for each signal, a next step in one embodiment is to determine the actual heart rate represented by some window of time, for which there may be multiple candidate heart rate signals. Another possible determination is that no heart rate can be extracted from this window of time.

Various techniques for extracting heart rate are described herein; note that these are not mutually exclusive. The exemplified techniques generally build on the basic approach of taking a Fourier (or wavelet) transform of a signal and finding the highest peak in the corresponding spectrum, within the range of frequencies corresponding to reasonable human heart rates.

Candidate filtering 234 is part of one method for estimating a heart rate, so as to choose one or more of the candidate signals for heart rate extraction. In one embodiment, candidate signals are ranked according to the quality score assigned in the prior phase, using a machine learning system to integrate the quality metrics into a single quality score for each candidate signal. Only the top k (e.g., the top two) signals, as ranked by the supervised classification system, are selected for further examination.
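The top-k selection may be sketched as below, assuming the quality scores have already been produced by the classifier/scorer. The function name, the signal labels and the score values are hypothetical and used only for illustration.

```python
def top_k_candidates(quality_scores, k=2):
    """Return the names of the k candidate signals with the highest quality scores."""
    ranked = sorted(quality_scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical quality scores for four candidate signals.
scores = {"component_1": 0.31, "component_2": 0.87, "component_3": 0.12, "component_4": 0.64}
selected = top_k_candidates(scores, k=2)
```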

Given multiple possible peaks in the power spectrum 236 of a candidate signal that may correspond to heart rate, a conventional approach is to assume that the largest peak corresponds to heart rate. However, even if face tracking is used to define the region of interest so that in theory a moving face does not introduce motion artifact into the candidate heart rate signals, some amount of motion artifact virtually always remains in candidate signals. As a result, motion may remain a challenge for estimating heart rate from video streams. For example, even if a signal is pre-processed to minimize the effects of motion, some amount of motion is likely to remain in the candidate signals, and motion of a face is often very close in frequency to a human heart rate (about 1 Hz).

Thus, as described herein, motion may be estimated such as by a motion compensator 238 (computation mechanism) of FIG. 2 and used to suppress (e.g., eliminate or reduce the quality score of) heart rate signals that are likely to actually be motion-generated. More particularly, other features for signal quality may be derived by comparing the signal to an estimate of the motion pattern in the video from which these signals were derived, e.g., computed from the optical flow in the video stream or via face tracker output coordinates. Note however that motion signals may be sensed in many ways, including via an accelerometer, and any way or combination of ways of obtaining a reasonable motion power spectrum 240 may be used.

In general, if a candidate signal is very similar to the motion pattern (as computed by cross-correlation, for example), the candidate signal is statistically less likely to contain information about pulse, which may be used to lower its quality score as described herein. Pulse templates need not be based only on time, but may also be based on space, because a true pulse signal does not appear uniformly across the face; rather, a pulse progresses across the face in a consistent pattern (which may vary from person to person) that relates to the density of blood vessels in different parts of the face and the orientation of the larger blood vessels delivering blood to the face. Consequently, a high correlation of the full space-time sequence of images with a known space-time template is indicative of high signal quality.
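One non-limiting way to lower a quality score based on similarity to the motion pattern is sketched below. It uses the zero-lag Pearson correlation as a simple stand-in for cross-correlation, and the scaling rule `quality * (1 - |correlation|)` is an assumption made for this illustration, not a rule stated in the description.

```python
import math


def pearson(a, b):
    """Zero-lag normalized cross-correlation (Pearson correlation) of two equal-length signals."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def motion_adjusted_quality(quality, candidate, motion):
    """Scale the quality score down as the candidate resembles the motion signal (assumed rule)."""
    return quality * (1.0 - abs(pearson(candidate, motion)))


# Hypothetical motion trace, a candidate that simply mirrors it, and an unrelated candidate.
motion = [0, 1, 0, -1, 0, 1, 0, -1]
motion_like = [2 * m + 3 for m in motion]
unrelated = [1, 0, -1, 0, 1, 0, -1, 0]
lowered = motion_adjusted_quality(0.8, motion_like, motion)
kept = motion_adjusted_quality(0.8, unrelated, motion)
```

A fuller version might take the maximum correlation over a range of lags, since motion artifacts need not be time-aligned with the sensed motion signal.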

The motion compensator 238 provides the motion power spectrum 240, which is generally used to assist in detecting when a person's coincidental movement may be causing the input video signal 222 to resemble a pulse. In other words, data (e.g., a transform) corresponding to the movement, such as the power spectrum 240 of the motion signal, may be used to lower the quality score of (and thus potentially eliminate) one or more of the candidate signals 228 that look like quality pulse signals but are instead likely to be caused by the subject's motion. Note that the motion compensator 238 may be based upon determining motion from the video, and/or from one or more external motion sensors 116 (FIG. 1) such as an accelerometer.

In one implementation, the power spectrum of the motion signal may be used for motion peak suppression (block 246), such as to assign a lower weight to peaks in the power spectrum of the candidate heart rate signal that align closely with peaks in the power spectrum of the motion signal. That is, the system may pick a peak that is not the largest peak in the spectrum of the candidate signal, if that largest peak aligns too closely with probable motion frequencies.
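Motion peak suppression may be sketched as follows. The tolerance of 0.1 Hz, the multiplicative penalty of 0.25 and the function names are assumptions for this illustration; the description only states that closely aligned peaks receive a lower weight.

```python
def suppress_motion_peaks(candidate_peaks, motion_peak_freqs, tolerance_hz=0.1, penalty=0.25):
    """Down-weight candidate peaks lying within tolerance_hz of any motion peak.

    candidate_peaks:   list of (frequency_hz, power) pairs.
    motion_peak_freqs: list of peak frequencies (Hz) from the motion power spectrum.
    """
    adjusted = []
    for freq, power in candidate_peaks:
        near_motion = any(abs(freq - m) <= tolerance_hz for m in motion_peak_freqs)
        adjusted.append((freq, power * penalty if near_motion else power))
    return adjusted


def pick_heart_rate_bpm(candidate_peaks, motion_peak_freqs):
    """Pick the strongest peak after suppression and convert Hz to beats per minute."""
    adjusted = suppress_motion_peaks(candidate_peaks, motion_peak_freqs)
    freq, _ = max(adjusted, key=lambda fp: fp[1])
    return freq * 60.0


# Hypothetical spectra: the largest candidate peak (1.0 Hz) coincides with head motion.
candidate_peaks = [(1.0, 9.0), (1.3, 6.0)]
motion_peak_freqs = [1.02]
bpm = pick_heart_rate_bpm(candidate_peaks, motion_peak_freqs)
```

Here the 1.0 Hz peak is the largest before suppression, but because it aligns with the motion peak at 1.02 Hz its weight drops below that of the 1.3 Hz peak, so roughly 78 beats per minute is selected instead of 60.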

Typically there are multiple candidate signals that were not filtered out in the filtering stage. Each remaining candidate signal has a power spectrum 248 that has been adjusted for similarity to the motion spectrum. To choose a final heart rate, one implementation uses a weighted combination of the overall quality estimate of each remaining candidate and the prominence of the peak that is believed to represent the heart rate in each of the chosen signals. Candidates with high signal quality and prominent heart rate peaks are preferred over candidates with lower signal quality and less prominent heart rate peaks (where prominence is defined as a function of the distance to other peaks and the amplitude relative to adjacent valleys in the power spectrum 248).

At this stage, a candidate heart rate is selected, as shown via block 250 of FIG. 2. Using one or more of the quality metrics, the system may decide that even the best heart rate signal is not of sufficient quality to report to an application or to a user, and this entire frame may be rejected (e.g., the system outputs “heart rate not available” or the like). The quality metrics also may be provided to an application that is consuming the final heart rate signal, as applications may be interested in the quality metrics, for example to place more or less weight on a particular heart rate estimate when computing a user's caloric expenditure.
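The weighted selection and the possible rejection may be sketched together as follows. The equal weighting, the rejection threshold of 0.4, the assumption that quality and prominence are both scaled to the range 0 to 1, and the dictionary keys are all hypothetical choices for this illustration.

```python
def select_heart_rate(candidates, quality_weight=0.5, min_score=0.4):
    """Return (bpm, score) for the best candidate, or (None, score) if it is rejected.

    candidates: list of dicts with 'bpm', 'quality' and 'prominence' (both assumed in 0..1).
    """
    best, best_score = None, float("-inf")
    for candidate in candidates:
        score = (
            quality_weight * candidate["quality"]
            + (1.0 - quality_weight) * candidate["prominence"]
        )
        if score > best_score:
            best, best_score = candidate, score
    if best is None or best_score < min_score:
        return None, best_score  # corresponds to "heart rate not available"
    return best["bpm"], best_score


good_window = [
    {"bpm": 72.0, "quality": 0.9, "prominence": 0.8},
    {"bpm": 95.0, "quality": 0.6, "prominence": 0.9},
]
poor_window = [{"bpm": 110.0, "quality": 0.2, "prominence": 0.3}]
reported, reported_score = select_heart_rate(good_window)
rejected, rejected_score = select_heart_rate(poor_window)
```

Returning the score alongside the heart rate reflects the point above that a consuming application may want the quality information in addition to the estimate itself.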

Temporal smoothing 252, such as based on the summary quality metric data 232, also may be used as described herein. For example, when an estimate of the current heart rate for a particular window in time is available, the estimates may vary significantly from one window to the next as a result of incorrect predictions. By way of example, a sequence of estimates separated by ten seconds each may be [70 bpm, 71 bpm, 140 bpm, 69 bpm] (where bpm is beats per minute). In this example, it is very likely that the estimate of 140 bpm was an error. As can be readily appreciated, reporting such rapid, unrealistic changes in heart rate that are likely errors is undesirable.

Described herein are example techniques for “smoothing” the series of heart rate estimates, including smoothing by dynamic programming and confidence-based weighting; note that these techniques are not mutually exclusive, and one or both may be used separately, together with one another, and/or with one or more other smoothing techniques.

With respect to smoothing by dynamic programming, the system likely still has multiple candidate peaks in the power spectrum that may represent heart rate (from multiple candidate signals and/or multiple peaks in each candidate signal's power spectrum). As described above, in one embodiment a single final heart rate estimate was chosen. As an alternative to choosing a single heart rate, a list or the like of the candidate heart rate values at each window in time may be maintained, with each value associated with a confidence score (e.g., a combination of the signal quality metric for the candidate signal and the prominence of the peak itself in the power spectrum), with a dynamic programming approach used to select the “best series” of candidates across many windows in a sequence. The “best series” may be defined as the one that picks the heart rate values having the most confidence, subject to penalties for large, rapid jumps in heart rate that are not physiologically plausible.
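A non-limiting sketch of such a dynamic programming selection follows, in the style of a Viterbi search. The scoring rule (sum of confidences minus a linear penalty on any jump beyond `max_jump_bpm`), the parameter values and the confidence numbers are assumptions for this illustration; each window is assumed to hold at least one candidate.

```python
def best_series(windows, max_jump_bpm=10.0, jump_penalty=0.05):
    """Pick one candidate per window, maximizing confidence minus jump penalties.

    windows: list of non-empty lists of (bpm, confidence) candidates, one list per window.
    """
    scores = [conf for _, conf in windows[0]]  # best total score ending at each candidate
    back_pointers = []
    for prev, cur in zip(windows, windows[1:]):
        new_scores, pointers = [], []
        for bpm, conf in cur:
            # Score of reaching this candidate from each candidate in the previous window.
            options = [
                scores[j] - jump_penalty * max(0.0, abs(bpm - prev_bpm) - max_jump_bpm)
                for j, (prev_bpm, _) in enumerate(prev)
            ]
            best_j = max(range(len(options)), key=options.__getitem__)
            new_scores.append(options[best_j] + conf)
            pointers.append(best_j)
        scores = new_scores
        back_pointers.append(pointers)
    # Trace back from the best final candidate to recover the whole series.
    j = max(range(len(scores)), key=scores.__getitem__)
    path = [j]
    for pointers in reversed(back_pointers):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [windows[i][j][0] for i, j in enumerate(path)]


# Hypothetical candidates echoing the 70, 71, 140, 69 bpm example; in the third
# window the erroneous 140 bpm candidate has the higher individual confidence.
windows = [
    [(70.0, 0.9)],
    [(71.0, 0.8), (142.0, 0.3)],
    [(140.0, 0.7), (70.0, 0.6)],
    [(69.0, 0.9)],
]
smoothed = best_series(windows)
```

Choosing by confidence alone would report 140 bpm for the third window; with the jump penalty, the series through 70 bpm scores higher overall.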

With respect to confidence-based weighting, another approach to smoothing the series of heart rate measurements is to weight new estimates according to their confidence. A very high confidence score in a new estimate, possibly as high as one-hundred percent, may be used as a threshold for reporting that estimate right away. If there is more confidence in previous measurements than in the current measurement, the current and previous estimates may be blended according to the current confidence values and/or previous confidence values, for example as a linear (or other mathematical) combination weighted by confidence. Consider that the current heart rate estimate is h(t), the previous heart rate estimate is h(t−1), the current confidence value is α(t), and the previous confidence value is α(t−1). The following are some example schemes for confidence-based selection of the final reported heart rate h′(t).

Weight only according to current confidence:

h′(t)=α(t)h(t)+(1−α(t))h(t−1)

Weight according to current and previous confidences:

$h'(t) = \frac{\alpha(t)}{\alpha(t) + \alpha(t-1)}\,h(t) + \frac{\alpha(t-1)}{\alpha(t) + \alpha(t-1)}\,h(t-1)$

The above temporal smoothing is based upon using known physiological constraints (e.g., a heart rate can only change so fast) along with other factors related to signal quality, to more intelligently integrate across heart rate estimates that do not always agree. Such known physiological constraints can be dynamic, and can be informed by context. For example, a subject's heart rate is likely to change more rapidly when the subject is moving a lot, whereby information from a motion signal (coming from video and/or from an inertial sensor such as in a smartphone or watch) can inform the temporal smoothing method. For example, what is considered implausible for a person who is relatively still may not be considered implausible for a person who is rapidly changing motions.

The above technology has thus far been described in the context of heart rate estimation from video sources. However, alternative embodiments may apply these techniques to other sources of heart rate signals, such as photoplethysmograms (PPGs, as used in finger pulse oximeters and heart-rate-sensing watches), electrocardiograms (ECGs), or pressure waveforms. In these scenarios, the candidate signals may be signals from one or more sensors (e.g., a red light sensor, a green light sensor, and a pressure sensor under a watch) or one or more locations (e.g., two different electrical sensors). The motion signal may be derived from an accelerometer or other such inertial sensor in such cases, for example.

FIGS. 4 and 5 are directed towards additional details of an example implementation that achieves robust heart rate estimation through operations applied sequentially on video (of regions of the face in this example). Such operations are shown in FIG. 4, and include region-of-interest detection and processing 442, signal separation and motion filtering 444, component selection 446 and heart rate estimation 448.

Micro-fluctuations due to blood flow in the face form temporally coherent sources due to their periodicity. A signal separation algorithm such as ICA is capable of separating the heart rate signal from other temporal noise such as intensity changes due to motion or environmental noise. In the exemplified implementation of FIG. 4, the red, green, and blue channels of the camera are treated as three separate sensors that record a mixture of signals originating from multiple sources.

ICA is well known for finding underlying factors from multi-variate statistical data, and may be more appropriate than methods like Principal Component Analysis (PCA). Notwithstanding, if a transformation is used, any suitable transformation may be used.

Applying region detection on N frames yielded an input data matrix X, of size 9×N, which can be represented as

X=AS  (1)

where A is the matrix that contains weights indicating linear combination of multiple underlying sources contained in S. The S matrix of size 9×N contains the separated sources (called components), any one (or combination) of which may represent the signal associated with the pulse changes on the face. One implementation utilized the Joint Approximate Diagonalization of Eigenmatrices (JADE) algorithm to implement ICA. Note that forcing the number of output components to be equal to the number of input mixed signals represents a dense model that helps separate unknown sources of noise with good accuracy.
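As an illustrative expansion of Equation (1), and assuming A is square (9×9) because the number of output components equals the number of input signals, each observed row of X is a weighted sum of the source rows of S:

$x_{i}(n) = \sum\limits_{j = 1}^{9} a_{ij}\, s_{j}(n), \quad i = 1, \ldots, 9, \quad n = 1, \ldots, N$

An ICA algorithm then estimates an unmixing matrix W (a symbol introduced here for illustration only) such that

$\hat{S} = WX, \quad W \approx A^{-1}$

where the rows of $\hat{S}$ are recovered only up to scale and ordering, which is why a separate component selection step is needed.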

With respect to motion filtering, natural head movements associated with daily activities such as watching television, performing desk work or exercising can significantly affect the accuracy of camera-based heart rate measurement. Both longer periodic motions and aperiodic motions need to be considered. Periodic motions include, for example, changes in the position and intensity of specular and diffuse reflections on the face while running or biking indoors. Aperiodic motions include, e.g., rapid head movements when switching gaze between multiple screens, looking to other objects in the environment, or looking away from a screen.

Periodic motions cause large, temporally-varying color and intensity changes that are easily confused with variations due to pulse. This manifests itself as a highly correlated ICA component that captures motion-based intensity changes at multiple locations on the face. As facial motions often occur at rates in the same range of frequencies as heart rate, they cannot be ignored. An example is generally represented in FIGS. 5A-5C, which represent an example of motion filtering using large periodic motion. FIG. 5A shows three frames with different head positions and normalized head translation vectors derived from face tracking coordinates. FIG. 5B represents time domain signals for a selected heart rate signal (HR) and motion component (M), the latter having a correlation of 0.89 with the head translation of FIG. 5A. FIG. 5C shows the power spectrum of the selected component with two peaks at heart rate and motion frequencies.

One or more implementations are directed toward solving the motion-related problems by tracking the head, in that head motion may closely correlate with changes in the intensity of light reflected from the skin when a person's head is in motion. The 2-D coordinates indicating the face location (mean of top-left and bottom-right) may be used to derive an approximate value for head motion between subsequent frames (FIG. 5A). The total amount of head activity between two subsequent frames may be estimated using the partial derivative of the centroid of the face location with respect to frame number:

$\begin{matrix}{{\Delta \; a_{n}} = {{\frac{\partial\;}{\delta \; n}\left( \sqrt{{\overset{\_}{x}}_{n}^{2} + {\overset{\_}{y}}_{n}^{2}} \right)}}} & (2) \\{{{a(t)} = {\sum\limits_{n = 1}^{w}{\Delta \; a_{n}}}},} & (3)\end{matrix}$

where a(t) represents the head activity within a window. One implementation empirically selected a window size w of 300 frames (10 seconds), as the smallest window feasible for heart rate detection. This metric may be used to automatically label each window as either motion or rest. A static threshold of twenty percent of the face dimension (length or width in pixels) was used for labeling windows. For example, if a face region is 200×200 pixels, the motion threshold for a ten-second window is set to 400 (0.2×200 pixels×10 sec). If the total head translation a(t) is greater than 400 pixels (over the 10 second window), the window is labeled as motion. These labels guide the processing and assist in heart rate estimation. For example, the heart rate is expected to be higher during periods of exercise (motion) than during rest periods.
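By way of a non-limiting illustration, the window labeling described above may be sketched as follows. The function names are hypothetical, the per-frame derivative of Equation (2) is approximated by a frame-to-frame difference, and absolute differences are summed; the latter is an assumption of this sketch, because a signed sum would telescope to the difference between the last and first frames.

```python
import math


def head_activity(centroids):
    """Approximate total head activity over a window of (x, y) face centroids."""
    radii = [math.hypot(x, y) for x, y in centroids]
    # Assumption: accumulate absolute per-frame changes in centroid magnitude.
    return sum(abs(b - a) for a, b in zip(radii, radii[1:]))


def label_window(centroids, face_size_px, window_sec=10.0, fraction=0.2):
    """Label a window 'motion' or 'rest' using the static 20% threshold."""
    # E.g., 0.2 x 200 pixels x 10 sec = 400 for a 200x200 face region.
    threshold = fraction * face_size_px * window_sec
    return "motion" if head_activity(centroids) > threshold else "rest"
```

A 300-frame window in which the centroid never moves accumulates zero activity and is labeled rest, whereas a centroid that oscillates by three pixels on every frame accumulates 897 pixels and exceeds the 400-pixel threshold.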

By way of example, motion filtering is generally represented in FIGS. 5A-5C using an example with large periodic motion. FIG. 5A shows three frames with different head positions and normalized head translation vectors derived from face tracking coordinates. FIGS. 5B and 5C show time domain signals for the selected signal and motion component, having a correlation of 0.89 with FIG. 5A, and the power spectrum of the selected component with two peaks at heart rate (HR) and motion (M) frequencies.

In this example, FIG. 5A illustrates approximate head motion values with the threshold set at 380 (face size 190×190 pixels), while a user alternates between blocks of cycling on an exercise bike and sitting still. The heart rate is expected to be higher during periods of exercise (motion) than the rest periods as illustrated in FIGS. 5B and 5C by corresponding heart rate (HR) estimates from the camera and the optical sensor. The heart rate drops rapidly at the end of each biking cycle as the user comes to a rest.

If the window is labeled as motion, any periodic signals related to the motion may be ignored by removing them. To do this, the component matrix S may be cross-correlated with the normalized face locations (Equation (2)) for that window.

To remove components that dominantly represent head motion, the rows in the component matrix S with a correlation greater than 0.5 (e.g., empirically determined) are discarded from further calculations. This motion filtering results in matrix S′. A global threshold for subjects can consistently reject components associated with large motion artifacts. If the window is given a rest label, no components are removed and the computation proceeds to the next stage, shown in FIG. 4 as automatic component selection 446.
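By way of a non-limiting illustration, the motion filtering step may be sketched as follows. The sketch uses a zero-lag Pearson correlation in place of a full cross-correlation and compares its absolute value against the threshold; both choices, and the function names, are assumptions made here for brevity.

```python
import math


def pearson(u, v):
    """Zero-lag Pearson correlation between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return num / den if den > 0 else 0.0


def filter_motion_components(components, motion, label, threshold=0.5):
    """Return S' by dropping rows of S that track the head motion signal."""
    if label != "motion":
        # Rest windows pass through unchanged.
        return list(components)
    # Assumption: the sign of an ICA component is arbitrary, so use |r|.
    return [row for row in components
            if abs(pearson(row, motion)) <= threshold]
```

A component that is a scaled copy of the motion signal has a correlation of one and is discarded, while a component in quadrature with the motion signal has a correlation near zero and is retained.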

Periodic head motion may be visually and statistically similar to one of the nine components derived from the raw data. The statistical similarity may confuse a peak detection method that relies on a MAP-estimate, causing it to falsely report the highest peak in the power spectrum as heart rate. Thus, prior knowledge of the head motion frequency assists in picking the correct heart rate, even if the signal is largely dominated by head-motion-induced changes. Certain common types of aperiodic movements also may occur, such as induced when individuals scratch their face or turn their head, or perform short-duration body movements.

Component identification benefits from this preprocessing step as it enables unsupervised selection of the heart rate component and eliminates uncertainty associated with the arbitrary component ordering, which is a fundamental property of ICA methods.

With respect to component selection 446 in the exemplified implementation of FIG. 4, heart rate component identification may be treated as a classification and detection problem that can be divided into feature extraction and classification. Feature extraction derives a number of features primarily associated with the regularity of the signal, in that the underlying morphology (and dominant frequency) of a pulse waveform can be characterized by the number of regularly-spaced peaks. This is followed by classification, where a linear classifier or the like may be employed to estimate each candidate component's likelihood to be a pulse wave. The top two components (chosen for a variety of reasons set forth herein) are utilized for peak detection and heart rate estimation.

With respect to feature extraction, the component classification system makes use of a number of features (nine in this example) generally derived using the autocorrelation of each component. The autocorrelation value at a time instant t represents the correlation of the signal with a shifted version of itself (shifted by t seconds). Because the pulse waveform is reasonably periodic, autocorrelation effectively differentiates these waveforms from noise.

If a signal has a dominant periodic trend (of period T), the autocorrelation has high magnitude at shift T. The process computes the autocorrelation of each candidate component in matrix S′, and normalizes the autocorrelation signal so the value at a shift of zero is one. For each of these nine auto-correlations (one for each component), a number of features (e.g., eight in this example) that were observed as the most valuable indicators of regularity are computed.
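By way of a non-limiting illustration, a normalized autocorrelation may be sketched as follows. Removing the mean before correlating and using the biased (unscaled-by-overlap) estimator are assumptions of this sketch; the function name is hypothetical.

```python
def normalized_autocorrelation(signal, max_lag):
    """Autocorrelation for lags 0..max_lag, scaled so the lag-0 value is one."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # assumption: remove the mean first
    r0 = sum(c * c for c in centered)
    if r0 == 0:
        # A constant signal has no periodic structure to measure.
        return [0.0] * (max_lag + 1)
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / r0
        for lag in range(max_lag + 1)
    ]
```

For a signal that repeats every four samples, the result is close to one at a shift of four samples and close to negative one at a shift of two samples, which is the high-magnitude-at-shift-T behavior described above.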

A first feature is the total number of “prominent” peaks, such as the number of peaks greater than a static threshold (e.g., 0.2, set based on preliminary experiments) and located at least a threshold shift away from the neighboring peaks (0.33 seconds). FIGS. 6A-6C represent some of the feature extraction concepts; FIG. 6A shows a noise component, FIG. 6B an ambiguous component, and FIG. 6C a true heart rate waveform.

More particularly, FIGS. 6A-6C represent feature properties for data within a single time window selected from training data. The autocorrelation waveforms (solid lines) from the three selected components (dashed lines) each represent different autocorrelation properties/characteristics of the selected features that are used by the classifier.

The autocorrelation in FIG. 6C is labeled to highlight some of the features used by the classifier to label this component as heart rate. In the example of FIG. 6C, it is seen that the magnitude of the first peak 662 is greater than or equal to 0.2, and that the number of “best” peaks (greater than or equal to 0.2, represented by a dot at the top of each such peak) is seven. In this example, the minimum peak-to-peak lag, represented by arrow 664, is greater than or equal to 0.33 seconds. The mean and variance of the peak-to-peak lags are represented via the arrows labeled 666. The threshold for minimum spacing (FIGS. 6A-6C) may be chosen based on the maximum reasonable heart rate for a healthy user (e.g., 180 beats per minute, which corresponds to a beat-to-beat interval of 60/180≈0.33 seconds). Note that peaks occurring closer than the threshold may not be characteristic of a regular pulse waveform.

A second feature is the magnitude of the first “prominent” peak, excluding the initial peak, at zero lag, which is always equal to one. Periodic signals yield a higher value for this feature (FIG. 6C).

A third feature is computed as the product of the first two features, and helps resolve ambiguous cases where the highest peaks in two different candidate components have equal magnitude and lag (see e.g., FIG. 6B versus FIG. 6C).
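By way of a non-limiting illustration, the first three features may be sketched as follows for a normalized autocorrelation sampled at `fps` lags per second. The function names are hypothetical, and the rule of keeping the taller of two peaks that fall closer together than the minimum spacing is an assumption of this sketch.

```python
def prominent_peaks(ac, fps, min_height=0.2, min_spacing_sec=0.33):
    """Return lags of local maxima that meet the height and spacing thresholds."""
    min_gap = min_spacing_sec * fps  # minimum spacing expressed in samples
    peaks = []
    for lag in range(1, len(ac) - 1):  # lag 0 (always one) is excluded
        is_local_max = ac[lag] > ac[lag - 1] and ac[lag] >= ac[lag + 1]
        if not is_local_max or ac[lag] < min_height:
            continue
        if peaks and lag - peaks[-1] < min_gap:
            # Assumption: of two peaks that are too close, keep the taller.
            if ac[lag] > ac[peaks[-1]]:
                peaks[-1] = lag
            continue
        peaks.append(lag)
    return peaks


def regularity_features(ac, fps):
    """Return (peak count, first prominent peak magnitude, their product)."""
    peaks = prominent_peaks(ac, fps)
    count = len(peaks)
    first_magnitude = ac[peaks[0]] if peaks else 0.0
    return count, first_magnitude, count * first_magnitude
```

Because the third value multiplies the first two, two components whose first peaks have equal magnitude still receive different scores when one of them has more regularly spaced peaks.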

Other features include the mean and variance of peak-to-peak spacing (another measure of the periodicity of the signal), log entropy of the power spectrum of the autocorrelation (high entropy suggests multiple dominant frequencies), the first prominent peak's lag, and the total number of positive peaks.

Another feature, not derived from the autocorrelation, is the kurtosis of the time-domain component signal. This is primarily a measure of how non-Gaussian the signal is in terms of its probability distribution, that is, the “peaky-ness” of a discrete signal, similar to some of the autocorrelation features. The kurtosis values of each component in S′ are combined with the eight autocorrelation features in this example to provide the nine features.
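By way of a non-limiting illustration, the kurtosis feature may be sketched as follows. The sketch uses the non-excess (Pearson) definition, for which a Gaussian signal scores approximately three; which definition a given implementation uses is not specified herein, so this choice is an assumption.

```python
def kurtosis(signal):
    """Fourth standardized moment of a time-domain component signal."""
    n = len(signal)
    mean = sum(signal) / n
    m2 = sum((s - mean) ** 2 for s in signal) / n  # variance
    m4 = sum((s - mean) ** 4 for s in signal) / n  # fourth central moment
    return m4 / (m2 * m2) if m2 > 0 else 0.0
```

A signal that alternates evenly between two values scores one, whereas a mostly flat signal with a single large spike scores far above three, reflecting its “peaky-ness”.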

Turning to classification, to determine which component out of the nine estimated components is most likely to contain the heart rate estimate, a classifier may be used, e.g., a linear classifier (regression model). The training data comprised ten-second sliding windows (one-second step) with nine candidate components estimated in each window. The training labels (binary) were assigned in a supervised manner by comparing the ground truth heart rate (optical pulse sensor waveform) with each component. Any component where the highest power spectrum peak was located within ±2 beats per minute (bpm) of the actual heart rate was assigned a positive label.
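By way of a non-limiting illustration, the label assignment for one training window may be sketched as follows, given the frequency (in bpm) of the highest power spectrum peak of each component. The function name is hypothetical, and treating the ±2 bpm bound as inclusive is an assumption of this sketch.

```python
def assign_training_labels(component_peak_bpm, ground_truth_bpm,
                           tolerance_bpm=2.0):
    """Return 1 for components whose spectral peak matches the ground truth."""
    return [
        1 if abs(peak - ground_truth_bpm) <= tolerance_bpm else 0
        for peak in component_peak_bpm
    ]
```

With a ground truth of 72 bpm, components peaking at 71, 73.5 and 70 bpm are labeled positive, while a component peaking at 95 bpm is labeled negative.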

For each window in the test datasets, the feature matrix (of size nine features by nine components) is estimated and used with the classifier to obtain a binary label and a posteriori decision value α for each component. A signal-quality-driven peak detection approach, described herein, is applied to the best two components (the two highest α values) to estimate heart rate.

For heart rate estimation, the classifier provides confidence values for each ICA component to narrow in on the candidate component most likely to contain the pulse signal. Typically, multiple components are classified as likely heart rate candidates due to their heart rate-like autocorrelation feature values; this is particularly true with periodic motion, such as during exercise (even after motion filtering). In this example implementation, the process uses two signal quality metrics that reduce ambiguity in picking the frequency that corresponds to heart rate. In general, after applying such metrics in this example as described below, the highest peak in the power spectrum of the component selected by the metrics is reported as the estimated heart rate, h(t).

A first metric is the confidence value α provided by the classifier. The nine components are sorted based on this value with the highest k (e.g., two) chosen for further processing in the frequency domain.

A second metric is based on the power spectrum of each selected component. For each of these k components, the process estimates the power spectrum and obtains the highest two peak locations and their magnitudes (within the window of 0.75-3 Hz, corresponding to 45-180 bpm). The peak magnitudes n₁ and n₂ are further used to estimate the spectral peak confidence (β) for each component as βᵢ=1−n₂/n₁, where i denotes the sorted component index (1 or 2, with α₁≧α₂) and peak magnitudes n₁≧n₂.

Spectral peak confidence is a good measure of the fitness of the component. FIG. 7A shows examples of power spectra from example components that illustrate a wide range of corresponding values of peak confidence β. As shown in the examples labeled 770, 772 and 774, the larger the differences of the peaks' magnitudes, the closer β is to one (1), e.g., example 770, whereas nearly equal magnitudes force β closer to zero, e.g., example 774. The peak confidences may be sorted to determine the index that is more likely to contain a clean peak signal. Note that this metric is not necessary when a single candidate component is labeled by the classifier, in which case the highest peak for this component is reported.
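By way of a non-limiting illustration, the spectral peak confidence may be sketched as follows for a power spectrum that has already been estimated. The function name, the use of simple local maxima as peaks, and returning a confidence of one when only a single in-band peak exists are assumptions of this sketch.

```python
def spectral_peak_confidence(freqs_hz, power, f_lo=0.75, f_hi=3.0):
    """Return (heart rate in bpm at the highest in-band peak, beta)."""
    peaks = [
        i for i in range(1, len(power) - 1)
        if f_lo <= freqs_hz[i] <= f_hi
        and power[i] > power[i - 1] and power[i] >= power[i + 1]
    ]
    if not peaks:
        return None, 0.0
    peaks.sort(key=lambda i: power[i], reverse=True)
    n1 = power[peaks[0]]
    # Assumption: with a single peak there is no competitor, so n2 = 0.
    n2 = power[peaks[1]] if len(peaks) > 1 else 0.0
    return freqs_hz[peaks[0]] * 60.0, 1.0 - n2 / n1
```

For a spectrum whose two largest in-band peaks have magnitudes 4 (at 1.0 Hz) and 2 (at 2.0 Hz), the sketch reports 60 bpm with β=0.5; a large out-of-band value at 0.5 Hz is not considered.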

FIG. 7B shows an example where α₁=0.83≧α₂=0.75 (as determined by the classifier), but the second heart rate component is selected over the first component based on β₂=0.82≧β₁=0.19, that is, the β metric disagrees with the classifier output. A reason for developing a peak quality metric such as β is to avoid detection errors due to low-frequency noise. In FIG. 7B, the actual component (the dashed line with peak 776 (α₂=0.75, β₂=0.82)) is labeled by the classifier as the second-best component relative to the other component (the solid line with peak 778 (α₁=0.83, β₁=0.19)), which may result in a poor heart rate estimate without the application of the peak confidence β. In practice this metric is useful in cases where the proposed motion filtering approach was unable to completely remove the noise due to periodic intensity changes. Note that it is alternatively feasible to include β as a feature for the classifier.
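By way of a non-limiting illustration, combining the two metrics may be sketched as follows, where each candidate is an (α, β) pair. The function name and the tuple representation are assumptions of this sketch.

```python
def select_component(candidates, k=2):
    """Pick the index of the component to report.

    candidates: list of (alpha, beta) pairs, one per component.
    The top k components by alpha are kept, then the one with the
    highest beta among them is chosen.
    """
    if not candidates:
        raise ValueError("at least one candidate component is required")
    ranked = sorted(range(len(candidates)),
                    key=lambda i: candidates[i][0], reverse=True)
    top_k = ranked[:k]
    if len(top_k) == 1:
        # A single candidate needs no beta comparison.
        return top_k[0]
    return max(top_k, key=lambda i: candidates[i][1])
```

Using the FIG. 7B values, `select_component([(0.83, 0.19), (0.75, 0.82)])` returns index 1, that is, the component with the lower classifier confidence but the cleaner spectral peak.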

In this particular example, determining the final heart rate comprises a confidence-based weighting. In a real world scenario, there are multiple sources of noise (short and/or long duration), other than exercise-type motion, that may corrupt the signal due to large intensity changes. Some of these may include camera noise, flickering lights, talking, head-nodding, laughing, yawning, observing the environment, and face-occluding gestures. To address such noise, the decision value α (from the classifier) may be used as a signal quality index to weight the current heart rate estimate before reporting it. For example, the final reported heart rate value h′(t) may be estimated using the previous heart rate h(t−1) and the current estimated heart rate h(t):

h′(t)=αh(t)+(1−α)h(t−1).  (4)

The weighting presented here assists in minimizing large errors when the decision values are not high enough to indicate excellent signal quality. This model also plays a role in keeping track of the most recent stable heart rate in a continuous-monitoring scenario with or without motion artifacts. Note that performance of such a prediction model is largely dependent on the current window's estimate and the weight. At the end of this example process, a final heart rate h′(t) is computed for each ten-second overlapping window in a video sequence.
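By way of a non-limiting illustration, Equation (4) may be sketched as follows. The function name is hypothetical, and clamping the decision value to the range zero to one is an assumption of this sketch, since a classifier's decision value need not fall in that range.

```python
def weighted_heart_rate(h_curr, h_prev, alpha):
    """Apply h'(t) = alpha * h(t) + (1 - alpha) * h(t - 1)."""
    # Assumption: clamp alpha so the result stays between the two inputs.
    alpha = min(1.0, max(0.0, alpha))
    return alpha * h_curr + (1.0 - alpha) * h_prev
```

For example, if the previous heart rate is 70 bpm and a low-confidence window (α=0.2) estimates 120 bpm, the reported value is 80 bpm rather than 120 bpm, which limits the effect of a likely erroneous estimate.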

FIGS. 8 and 9 comprise a flow diagram summarizing various aspects of the technology described herein, beginning at step 802 which represents capturing signals and motion data for a time window. The signals may be obtained from a plurality of regions of interest. As is understood, the steps of FIGS. 8 and 9 may be repeated for each time window.

Step 804 represents computing the ICA or other transform from the signals. Step 806 processes the (e.g., transformed) signal data into the signal-based features described above.

Step 808 represents computing the motion data-based features. Note that this is used in alternatives in which the classifier is trained with motion data. It is alternatively feasible to use the motion data in other ways, e.g., to remove peak signals or lower confidence scores of peak signals based upon alignment with motion data, and so on.

Step 810 represents computing any other features that may be used in classification. These may include some or all of the (non-limiting) examples enumerated above, e.g., light information, distance data, activity level, demographic information, environmental data (temperature, humidity), visual properties and so on.

Step 812 feeds the computed feature data into the classifier, which in turn classifies the signals with respect to their quality as pulse candidates, e.g., each with a confidence score. The top k (e.g., two) candidates are selected from the classifier-provided confidence scores at step 814. The exemplified steps continue in FIG. 9.

Step 902 of FIG. 9 represents estimating the spectral peak confidence for each candidate, e.g., the β value computed based upon the magnitudes of the two highest peaks. Step 904 represents sorting the top k candidates by their peak confidence values.

Step 906 represents the smoothing operation. As described above, this may be based upon the previous value and the confidence score of the current value (e.g., equation (4)), and/or via another smoothing technique such as dynamic programming. Step 908 outputs the heart rate as modified by any smoothing in this example.

As can be seen, there is described a technology in which video-based heart rate measurements are more accurate and robust than previous techniques, including via sensing multiple regions of interest, motion filtering and/or automatic component selection to identify and process candidate waveforms for pulse estimation. Classification may be used to provide top candidates, which may be combined with other confidence metrics and/or temporal smoothing to produce a final heart rate per time window.

One or more aspects are directed towards computing pulse information from video signals of a subject captured by a camera over a time window, including processing signal data that contains the pulse information and that corresponds to at least one region of interest of the subject. The pulse information is extracted from the signal data, including by using motion data to reduce or eliminate effects of motion within the signal data. In one or more aspects, at least some of the motion data may be obtained from the video signals and/or from an external motion sensor.

Processing the signal data may comprise inputting the signal data and the motion data into a classifier, and receiving a signal quality estimation from the classifier. The signal quality estimation may be used to determine one or more candidate signals for extracting the pulse information. Processing the signal data may comprise processing a plurality of signals corresponding to a plurality of regions of interest and/or corresponding to a plurality of component signals. Processing the signal data may comprise performing a transformation on the video signals.

Heart rate data may be computed from the pulse information, and used to output a heart rate value based upon the heart rate data. This may include smoothing the heart rate data into the heart rate value based at least in part upon prior heart rate data, a confidence score, and/or dynamic programming.

One or more aspects include a signal quality estimator that is configured to receive candidate signals corresponding to a plurality of captured video signals of a subject. For each candidate signal, the signal quality estimator determines a signal quality value that is based at least in part upon the candidate signal's resemblance to pulse information. A heart rate extractor is configured to compute heart rate data corresponding to an estimated heart rate of the subject based at least in part upon the quality values.

A transform may be used to transform the captured video signals into the candidate signals. A motion suppressor may be coupled to or incorporated into the signal quality estimator, including to modify any candidate signal that is likely affected by motion based upon motion data sensed from the video signals and/or sensed by one or more external sensors.

The signal quality estimator may incorporate or be coupled to a machine-learned classifier, in which signal feature data corresponding to the candidate signals is provided to the classifier to obtain the quality values. Other feature data provided to the classifier may include motion data, light information, previous heart rate data, distance data, activity data, demographic information, environmental data, and/or data based upon visual properties.

The heart rate extractor may compute the data corresponding to a heart rate of the subject by selection of a number of selected candidate signals according to the quality values, and by choosing one of the selected candidate signals as representing pulse information based upon relationships of at least two peaks within each of the selected candidate signals. A heart rate smoothing component may be coupled to or incorporated into the heart rate extractor to smooth the heart rate data into a heart rate value based upon confidence data and/or prior heart rate data.

One or more aspects are directed towards providing sets of feature data to a classifier, each set of feature data including feature data corresponding to video data of a subject captured at one of a plurality of regions of interest. Quality data is received from the classifier for each set of feature data, the quality data providing a measure of pulse information quality represented by the feature data. Pulse information is extracted from video signal data corresponding to the video data of the subject, including by using the quality data to select the video signal data. Providing the sets of feature data to the classifier may include providing motion data as part of the feature data for each set. Heart rate data may be computed from the pulse information, to output a heart rate value based upon the heart rate data.

Example Operating Environment

It can be readily appreciated that the above-described implementation and its alternatives may be implemented on any suitable computing device or similar machine logic, including a gaming system, personal computer, tablet, DVR, set-top box, smartphone, standalone device and/or the like. Combinations of such devices are also feasible when multiple such devices are linked together. For purposes of description, a gaming (including media) system is described as one example operating environment hereinafter. However, it is understood that any or all of the components or the like described herein may be implemented in storage devices as executable code, and/or in hardware/hardware logic, whether local in one or more closely coupled devices or remote (e.g., in the cloud), or a combination of local and remote components, and so on.

FIG. 10 is a functional block diagram of an example gaming and media system 1000 and shows functional components in more detail. Console 1001 has a central processing unit (CPU) 1002, and a memory controller 1003 that facilitates processor access to various types of memory, including a flash Read Only Memory (ROM) 1004, a Random Access Memory (RAM) 1006, a hard disk drive 1008, and portable media drive 1009. In one implementation, the CPU 1002 includes a level 1 cache 1010, and a level 2 cache 1012 to temporarily store data and hence reduce the number of memory access cycles made to the hard drive, thereby improving processing speed and throughput.

The CPU 1002, the memory controller 1003, and various memory devices are interconnected via one or more buses (not shown). The details of the bus that is used in this implementation are not particularly relevant to understanding the subject matter of interest being discussed herein. However, it will be understood that such a bus may include one or more of serial and parallel buses, a memory bus, a peripheral bus, and a processor or local bus, using any of a variety of bus architectures. By way of example, such architectures can include an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnects (PCI) bus also known as a Mezzanine bus.

In one implementation, the CPU 1002, the memory controller 1003, the ROM 1004, and the RAM 1006 are integrated onto a common module 1014. In this implementation, the ROM 1004 is configured as a flash ROM that is connected to the memory controller 1003 via a Peripheral Component Interconnect (PCI) bus or the like and a ROM bus or the like (neither of which is shown). The RAM 1006 may be configured as multiple Double Data Rate Synchronous Dynamic RAM (DDR SDRAM) modules that are independently controlled by the memory controller 1003 via separate buses (not shown). The hard disk drive 1008 and the portable media drive 1009 are shown connected to the memory controller 1003 via the PCI bus and an AT Attachment (ATA) bus 1016. However, in other implementations, dedicated data bus structures of different types can also be applied in the alternative.

A three-dimensional graphics processing unit 1020 and a video encoder 1022 form a video processing pipeline for high speed and high resolution (e.g., High Definition) graphics processing. Data are carried from the graphics processing unit 1020 to the video encoder 1022 via a digital video bus (not shown). An audio processing unit 1024 and an audio codec (coder/decoder) 1026 form a corresponding audio processing pipeline for multi-channel audio processing of various digital audio formats. Audio data are carried between the audio processing unit 1024 and the audio codec 1026 via a communication link (not shown). The video and audio processing pipelines output data to an A/V (audio/video) port 1028 for transmission to a television or other display/speakers. In the illustrated implementation, the video and audio processing components 1020, 1022, 1024, 1026 and 1028 are mounted on the module 1014.

FIG. 10 shows the module 1014 including a USB host controller 1030 and a network interface (NW I/F) 1032, which may include wired and/or wireless components. The USB host controller 1030 is shown in communication with the CPU 1002 and the memory controller 1003 via a bus (e.g., PCI bus) and serves as host for peripheral controllers 1034. The network interface 1032 provides access to a network (e.g., Internet, home network, etc.) and may be any of a wide variety of wired or wireless interface components including an Ethernet card or interface module, a modem, a Bluetooth module, a cable modem, and the like.

In the example implementation depicted in FIG. 10, the console 1001 includes a controller support subassembly 1040, for supporting at least four game controllers 1041(1)-1041(4). The controller support subassembly 1040 includes any hardware and software components needed to support wired and/or wireless operation with an external control device, such as for example, a media and game controller. A front panel I/O subassembly 1042 supports the multiple functionalities of a power button 1043, an eject button 1044, as well as any other buttons and any LEDs (light emitting diodes) or other indicators exposed on the outer surface of the console 1001. The subassemblies 1040 and 1042 are in communication with the module 1014 via one or more cable assemblies 1046 or the like. In other implementations, the console 1001 can include additional controller subassemblies. The illustrated implementation also shows an optical I/O interface 1048 that is configured to send and receive signals (e.g., from a remote control 1049) that can be communicated to the module 1014.

Memory units (MUs) 1050(1) and 1050(2) are illustrated as being connectable to MU ports “A” 1052(1) and “B” 1052(2), respectively. Each MU 1050 offers additional storage on which games, game parameters, and other data may be stored. In some implementations, the other data can include one or more of a digital game component, an executable gaming application, an instruction set for expanding a gaming application, and a media file. When inserted into the console 1001, each MU 1050 can be accessed by the memory controller 1003.

A system power supply module 1054 provides power to the components of the gaming system 1000. A fan 1056 cools the circuitry within the console 1001.

An application 1060 comprising machine instructions is typically stored on the hard disk drive 1008. When the console 1001 is powered on, various portions of the application 1060 are loaded into the RAM 1006, and/or the caches 1010 and 1012, for execution on the CPU 1002. In general, the application 1060 can include one or more program modules for performing various display functions, such as controlling dialog screens for presentation on a display (e.g., high definition monitor), controlling transactions based on user inputs and controlling data transmission and reception between the console 1001 and externally connected devices.

As represented via block 1070, a camera (including visible, IR and/or depth cameras) and/or other sensors, such as a microphone, external motion sensor and so forth may be coupled to the system 1000 via a suitable interface 1072. As shown in FIG. 10, this may be via a USB connection or the like, however it is understood that at least some of these kinds of sensors may be built into the system 1000.

The gaming system 1000 may be operated as a standalone system by connecting the system to a high definition monitor, a television, a video projector, or other display device. In this standalone mode, the gaming system 1000 enables one or more players to play games, or enjoy digital media, e.g., by watching movies, or listening to music. However, with the integration of broadband connectivity made available through the network interface 1032, gaming system 1000 may further be operated as a participating component in a larger network gaming community or system.

CONCLUSION

While the invention is susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention.

What is claimed is:
 1. A method comprising, computing pulse information from video signals of a subject captured by a camera over a time window, including processing signal data that contains the pulse information and that corresponds to at least one region of interest of the subject, and extracting the pulse information from the signal data, including using motion data to reduce or eliminate effects of motion within the signal data.
 2. The method of claim 1 wherein processing the signal data comprises inputting the signal data and the motion data into a classifier, and further comprising, receiving a signal quality estimation from the classifier, and using the signal quality estimation to determine one or more candidate signals for extracting the pulse information.
 3. The method of claim 1 wherein processing the signal data comprises processing a plurality of signals corresponding to a plurality of regions of interest, or processing a plurality of signals corresponding to a plurality of component signals, or both processing a plurality of signals corresponding to a plurality of regions of interest and processing a plurality of signals corresponding to a plurality of component signals.
 4. The method of claim 1 wherein extracting the pulse information from the signal data comprises extracting feature data.
 5. The method of claim 1 wherein extracting the feature data comprises determining feature data corresponding to at least one of: autocorrelation data, spectral entropy data, motion data, light information, previous heart rate data, distance data, activity data, demographic information, environmental data, or data based upon visual properties.
 6. The method of claim 1 further comprising, obtaining at least some of the motion data from the video signals.
 7. The method of claim 1 further comprising, obtaining at least some of the motion data from an external motion sensor.
 8. The method of claim 1 wherein the pulse information corresponds to heart rate data, and further comprising, smoothing the heart rate data based at least in part upon prior heart rate data.
 9. The method of claim 1 wherein the pulse information corresponds to heart rate data, and further comprising, smoothing the heart rate data based at least in part upon a confidence score.
 10. The method of claim 1 wherein the pulse information corresponds to heart rate data, and further comprising, smoothing the heart rate data based at least in part upon dynamic programming.
 11. A system comprising: a signal quality estimator, the signal quality estimator configured to receive candidate signals corresponding to a plurality of captured video signals of a subject, and for each candidate signal, the signal quality estimator further configured to determine a signal quality value that is based at least in part upon feature data extracted from the candidate signal, and a heart rate extractor, the heart rate extractor configured to compute heart rate data corresponding to an estimated heart rate of the subject based at least in part upon the quality values.
 12. The system of claim 11 further comprising a motion suppressor coupled to or incorporated into the signal quality estimator, the motion suppressor configured to modify any candidate signal that is likely affected by motion based upon motion data sensed from the video signals or sensed by one or more external sensors, or both sensed from the video signals and sensed by one or more external sensors.
 13. The system of claim 11 wherein the feature data correspond to spectral entropy data or autocorrelation data, or both.
 14. The system of claim 11 wherein the heart rate extractor is configured to compute the data corresponding to a heart rate of the subject by selection of a number of selected candidate signals according to the quality values, and to choose one of the selected candidate signals as representing pulse information based upon relationships of at least two peaks within the power spectrum of each of the selected candidate signals.
 15. The system of claim 11 wherein the signal quality estimator incorporates or is coupled to a machine-learned classifier, in which signal feature data corresponding to the candidate signals is provided to the classifier to obtain the quality values.
 16. The system of claim 15 further comprising other feature data provided to the classifier, including feature data corresponding to at least one of: motion data, light information, previous heart rate data, distance data, activity data, demographic information, environmental data, or data based upon visual properties.
 17. The system of claim 11 further comprising a heart rate smoothing component coupled to or incorporated into the heart rate extractor, the heart rate smoothing component configured to smooth the heart rate data into a heart rate value based upon confidence data or prior heart rate data, or based upon both confidence data and prior heart rate data.
 18. One or more machine-readable storage devices or machine logic having executable instructions, which when executed perform steps, comprising: providing sets of feature data to a classifier, each set of feature data including feature data corresponding to video data of a subject captured at one of a plurality of regions of interest; receiving quality data from the classifier for each set of feature data, the quality data for each set of feature data providing a measure of pulse information quality represented by the feature data; and extracting pulse information from video signal data corresponding to the video data of the subject, including using the quality data to select the video signal data.
 19. The one or more machine-readable storage devices or machine logic of claim 18 wherein providing the sets of feature data to the classifier comprises providing motion data as part of the feature data for each set.
 20. The one or more machine-readable storage devices or machine logic of claim 18 having further executable instructions comprising computing heart rate data from the pulse information, and outputting a heart rate value based upon the heart rate data.
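
By way of non-limiting illustration only, and forming no part of the claims, the following Python sketch shows one way that a spectral entropy feature might be computed for each candidate signal and used to select a signal from which a heart rate is estimated. It makes several assumptions that do not come from the disclosure: the function names (power_spectrum, spectral_entropy, dominant_bpm, select_and_estimate) are hypothetical, the 0.7 Hz to 4.0 Hz pulse band is an assumed range, a naive discrete Fourier transform is used for self-containment, and the value "1 minus normalized spectral entropy" is used as a simple stand-in for the quality value that a trained classifier would otherwise provide.

```python
import math
import random


def power_spectrum(signal, fps):
    """Return (freqs_hz, power) for the positive DFT bins of a mean-removed signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    freqs, power = [], []
    for k in range(1, n // 2):  # skip the DC bin; naive O(n^2) DFT for clarity
        angle = 2.0 * math.pi * k / n
        re = sum(x * math.cos(angle * t) for t, x in enumerate(centered))
        im = sum(x * math.sin(angle * t) for t, x in enumerate(centered))
        freqs.append(k * fps / n)
        power.append(re * re + im * im)
    return freqs, power


def spectral_entropy(power):
    """Normalized Shannon entropy: near 0 if one bin dominates, near 1 if flat."""
    total = sum(power)
    if total <= 0.0 or len(power) < 2:
        return 1.0
    probs = [p / total for p in power if p > 0.0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(power))


def dominant_bpm(freqs, power, lo_hz=0.7, hi_hz=4.0):
    """Strongest spectral peak inside the assumed pulse band, in beats per minute."""
    band = [(p, f) for f, p in zip(freqs, power) if lo_hz <= f <= hi_hz]
    if not band:
        raise ValueError("no frequency bins fall inside the pulse band")
    _, peak_hz = max(band)
    return 60.0 * peak_hz


def select_and_estimate(candidates, fps):
    """Return (index, bpm, quality) for the candidate with the highest quality.

    Quality here is 1 - spectral entropy, an assumed stand-in for a
    classifier-provided signal quality value.
    """
    best = None
    for i, sig in enumerate(candidates):
        freqs, power = power_spectrum(sig, fps)
        quality = 1.0 - spectral_entropy(power)
        if best is None or quality > best[2]:
            best = (i, dominant_bpm(freqs, power), quality)
    return best


if __name__ == "__main__":
    fps, n = 30.0, 300  # 10-second window at an assumed 30 frames per second
    rng = random.Random(0)
    noisy = [rng.gauss(0.0, 1.0) for _ in range(n)]  # e.g., a motion-dominated region
    clean = [math.sin(2.0 * math.pi * 1.2 * t / fps) for t in range(n)]  # 1.2 Hz pulse
    index, bpm, quality = select_and_estimate([noisy, clean], fps)
    print(index, round(bpm, 1), round(quality, 3))
```

In this sketch a candidate whose power is concentrated in a single spectral peak receives a higher quality value than one whose power is spread across the spectrum, which is one plausible reason spectral entropy may serve as a feature; an actual implementation may combine it with autocorrelation data, motion data and other features as recited above.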